Here we import some packages that we'll need in various places. We'll also load all the variables we set in config.
In [1]:
!mkdir -p ~/agave
%cd ~/agave
!pip3 install --upgrade setvar
import re
import os
import sys
import json
from setvar import *
from time import sleep
# This cell enables inline plotting in the notebook
%matplotlib inline
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
loadvar()
If you are running locally using Docker Compose, you will need to pull the IP address and port of your reverse tunnel from the sandbox; the cell below that checks USE_TUNNEL takes care of this.
If you specified the correct address for your Tenants API, you should be able to discover the available tenants through the CLI.
In [2]:
!tenants-list
Select the tenant you would like to use by setting the AGAVE_TENANT
environment variable. You may give either the name of the tenant or its number in the list above; a value of 0 selects the default tenant.
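For example, you could set the tenant non-interactively in a cell before running tenants-init. The tenant id below is purely illustrative; substitute a name or number from your tenants-list output:

# Hypothetical tenant id: replace with a value from tenants-list, or 0 for the default.
setvar("AGAVE_TENANT=agave.prod")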
In [3]:
!tenants-init -f
In this next step we delete the client if it already exists. Chances are yours doesn't yet; the command is here in case you want to re-create your client later on. Deleting a client before it has been created does no harm.
In [4]:
!clients-delete -u "$AGAVE_USERNAME" -p "$AGAVE_PASSWD" $AGAVE_APP_NAME
In this step we create the client. Clients provide a way of encapsulating resources connected to a single project. Through the client, you will receive a token which you can use to run most of the Agave commands.
In [5]:
!clients-create -u $AGAVE_USERNAME -p "$AGAVE_PASSWD" -N $AGAVE_APP_NAME -S
Create the token for your client. You will, from this point on, use this token to run the remainder of the Agave commands in this tutorial.
In [6]:
!auth-tokens-create -u $AGAVE_USERNAME -p "$AGAVE_PASSWD"
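Tokens issued this way eventually expire. If Agave commands later in this notebook begin failing with authorization errors, you can usually refresh the token without re-entering your credentials:

# Refresh the current OAuth token using the stored refresh token.
!auth-tokens-refresh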
In [7]:
if os.environ.get('USE_TUNNEL') == 'True':
    # Fetch the hostname and port of the reverse tunnel running in the sandbox
    # so Agave can connect to our local sandbox.
    !echo $(ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null sandbox 'curl -s http://localhost:4040/api/tunnels | jq -r ".tunnels[0].public_url"') > ngrok_url.txt
    !cat ngrok_url.txt | sed 's|^tcp://||'
    !cat ngrok_url.txt | sed 's|^tcp://||' | sed -r 's#(.*):(.*)#\1#' > ngrok_host.txt
    !cat ngrok_url.txt | sed 's|^tcp://||' | sed -r 's#(.*):(.*)#\2#' > ngrok_port.txt
    # Set the environment variables otherwise set when running in a training cluster.
    os.environ['VM_PORT'] = readfile('ngrok_port.txt').strip()
    os.environ['VM_MACHINE'] = readfile('ngrok_host.txt').strip()
    os.environ['AGAVE_SYSTEM_HOST'] = readfile('ngrok_host.txt').strip()
    os.environ['AGAVE_SYSTEM_PORT'] = readfile('ngrok_port.txt').strip()
    !echo "VM_PORT=$VM_PORT"
    !echo "VM_MACHINE=$VM_MACHINE"
    setvar("VM_IPADDRESS=$(getent hosts ${VM_MACHINE}|cut -d' ' -f1)")
Agave needs to know where you want to store the data associated with your jobs; here we set that up. Authentication to the storage machine will be through SSH keys. The private and public key files, however, contain newlines, so to embed them in JSON (the data format used by Agave) we run the jsonpki command on each file and then store the escaped contents in the environment for use by setvar.
In [8]:
!jsonpki --public ~/.ssh/id_rsa.pub > ~/.ssh/id_rsa.pub.txt
!jsonpki --private ~/.ssh/id_rsa > ~/.ssh/id_rsa.txt
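If you are curious what jsonpki is doing (or don't have it installed), the effect is essentially to JSON-escape the file so its newlines survive embedding in a JSON document. A minimal Python sketch of the same idea, assuming escaping is all that's needed:

import json
import os

# JSON-escape the key file; json.dumps adds surrounding quotes,
# which we strip because the templates below supply their own.
with open(os.path.expanduser('~/.ssh/id_rsa.pub')) as f:
    print(json.dumps(f.read())[1:-1])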
In [9]:
os.environ["PUB_KEY"]=readfile("${HOME}/.ssh/id_rsa.pub.txt").strip()
os.environ["PRIV_KEY"]=readfile("${HOME}/.ssh/id_rsa.txt").strip()
In this next cell, we create the JSON file used to describe the storage machine.
In [10]:
writefile("${AGAVE_STORAGE_SYSTEM_ID}.txt","""{
"id": "${AGAVE_STORAGE_SYSTEM_ID}",
"name": "${MACHINE_NAME} storage (${MACHINE_USERNAME})",
"description": "The ${MACHINE_NAME} computer",
"site": "${AGAVE_SYSTEM_SITE_DOMAIN}",
"type": "STORAGE",
"storage": {
"host": "${AGAVE_SYSTEM_HOST}",
"port": ${AGAVE_SYSTEM_PORT},
"protocol": "SFTP",
"rootDir": "/",
"homeDir": "${AGAVE_STORAGE_HOME_DIR}",
"auth": {
"username" : "${MACHINE_USERNAME}",
"publicKey" : "${PUB_KEY}",
"privateKey" : "${PRIV_KEY}",
"type" : "SSHKEYS"
}
}
}
""")
Here, we tell Agave about the machine. You can re-run the previous cell and the next one if you want to change the definition of your storage machine.
In [11]:
!systems-addupdate -F ${AGAVE_STORAGE_SYSTEM_ID}.txt
Next we run the Agave command files-list. This provides a check that we've set up the storage machine correctly.
In [12]:
!files-list -S ${AGAVE_STORAGE_SYSTEM_ID} ./ | head -5
You may not always wish to store your data on the same machine you run your jobs on, but in this tutorial we will assume that you do. The description of the execution machine is much like that of the storage machine, though there are a few more pieces of information to provide. In this example we invoke commands directly on the host instead of going through a batch queue scheduler, which is slightly simpler.
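For contrast, here is a rough sketch of the fields that would change if the same machine sat behind a SLURM batch scheduler. The values are illustrative only and are not used in this tutorial:

# Illustrative overrides for a batch system; these would replace the
# FORK/CLI settings in the execution system JSON below.
batch_overrides = {
    "executionType": "HPC",    # jobs go through a scheduler, not a forked shell
    "scheduler": "SLURM",      # Agave also supports PBS, SGE, LSF, and others
    "queues": [{
        "name": "normal",               # a real partition name on your cluster
        "default": True,
        "maxRequestedTime": "24:00:00"  # batch queues enforce wallclock limits
    }]
}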
In [13]:
# Edit any parts of this file that you know need to be changed for your machine.
writefile("${AGAVE_EXECUTION_SYSTEM_ID}.txt","""
{
    "id": "${AGAVE_EXECUTION_SYSTEM_ID}",
    "name": "${MACHINE_NAME} (${MACHINE_USERNAME})",
    "description": "The ${MACHINE_NAME} computer",
    "site": "${AGAVE_SYSTEM_SITE_DOMAIN}",
    "public": false,
    "status": "UP",
    "type": "EXECUTION",
    "executionType": "CLI",
    "scheduler" : "FORK",
    "environment": null,
    "scratchDir" : "${SCRATCH_DIR}",
    "queues": [
        {
            "name": "none",
            "default": true,
            "maxJobs": 10,
            "maxUserJobs": 10,
            "maxNodes": 6,
            "maxProcessorsPerNode": 6,
            "minProcessorsPerNode": 1,
            "maxRequestedTime": "00:30:00"
        }
    ],
    "login": {
        "auth": {
            "username" : "${MACHINE_USERNAME}",
            "publicKey" : "${PUB_KEY}",
            "privateKey" : "${PRIV_KEY}",
            "type" : "SSHKEYS"
        },
        "host": "${AGAVE_SYSTEM_HOST}",
        "port": ${AGAVE_SYSTEM_PORT},
        "protocol": "SSH"
    },
    "maxSystemJobs": 50,
    "maxSystemJobsPerUser": 50,
    "storage": {
        "host": "${AGAVE_SYSTEM_HOST}",
        "port": ${AGAVE_SYSTEM_PORT},
        "protocol": "SFTP",
        "rootDir": "/",
        "homeDir": "${AGAVE_STORAGE_HOME_DIR}",
        "auth": {
            "username" : "${MACHINE_USERNAME}",
            "publicKey" : "${PUB_KEY}",
            "privateKey" : "${PRIV_KEY}",
            "type" : "SSHKEYS"
        }
    },
    "workDir": "${AGAVE_STORAGE_WORK_DIR}"
}""")
In [14]:
!systems-addupdate -F ${AGAVE_EXECUTION_SYSTEM_ID}.txt
In [15]:
# Test to see if this worked...
!files-list -S ${AGAVE_EXECUTION_SYSTEM_ID} ./ | head -5
In [16]:
writefile("fork-wrapper.txt","""
#!/bin/bash
\${command}
""")
Using Agave commands, we make a directory on the storage server and deploy our wrapper file there.
In [17]:
!files-mkdir -S ${AGAVE_STORAGE_SYSTEM_ID} -N ${AGAVE_APP_DEPLOYMENT_PATH}
!files-upload -F fork-wrapper.txt -S ${AGAVE_STORAGE_SYSTEM_ID} ${AGAVE_APP_DEPLOYMENT_PATH}/
All Agave applications require a test file: a free-form text file in which you specify what resources you might need to test your application.
In [18]:
writefile("fork-test.txt","""
command=date
fork-wrapper.txt
""")
In [19]:
!files-mkdir -S ${AGAVE_STORAGE_SYSTEM_ID} -N ${AGAVE_APP_DEPLOYMENT_PATH}
!files-upload -F fork-test.txt -S ${AGAVE_STORAGE_SYSTEM_ID} ${AGAVE_APP_DEPLOYMENT_PATH}/
Like everything else in Agave, we describe our application with JSON. We specify which machines the application will use, what method it will use for submitting jobs, job parameters and files, and so on.
In [20]:
writefile("fork-app.txt","""
{
"name":"${AGAVE_USERNAME}-${MACHINE_NAME}-fork",
"version":"1.0",
"label":"Runs a command",
"shortDescription":"Runs a command",
"longDescription":"",
"deploymentSystem":"${AGAVE_STORAGE_SYSTEM_ID}",
"deploymentPath":"${AGAVE_APP_DEPLOYMENT_PATH}",
"templatePath":"fork-wrapper.txt",
"testPath":"fork-test.txt",
"executionSystem":"${AGAVE_EXECUTION_SYSTEM_ID}",
"executionType":"CLI",
"parallelism":"SERIAL",
"modules":[],
"inputs":[
{
"id":"datafile",
"details":{
"label":"Data file",
"description":"",
"argument":null,
"showArgument":false
},
"value":{
"default":"/dev/null",
"order":0,
"required":false,
"validator":"",
"visible":true
}
}
],
"parameters":[{
"id" : "command",
"value" : {
"visible":true,
"required":true,
"type":"string",
"order":0,
"enquote":false,
"default":"/bin/date",
"validator":null
},
"details":{
"label": "Command to run",
"description": "This is the actual command you want to run. ex. df -h -d 1",
"argument": null,
"showArgument": false,
"repeatArgument": false
},
"semantics":{
"label": "Command to run",
"description": "This is the actual command you want to run. ex. df -h -d 1",
"argument": null,
"showArgument": false,
"repeatArgument": false
}
}],
"outputs":[]
}
""")
In [21]:
!apps-addupdate -F fork-app.txt
Before we configure our notification, we need to create a requestbin to use. The Agave CLI has convenience commands for interacting with requestbin; we will use those to get our URL.
In [22]:
rburl = !requestbin-create
os.environ['REQUESTBIN_URL'] = rburl[0].strip()
Now that we have a URL to receive webhooks from our job, let's look at our job request. As configured, this job will send the requestbin a notification for every job event until the job reaches a terminal state. For a full list of job events, see http://docs.agaveplatform.org/#job-monitoring
In [23]:
writefile("job.txt","""
{
"name":"fork-command-1",
"appId": "${AGAVE_USERNAME}-${MACHINE_NAME}-fork-1.0",
"executionSystem": "${AGAVE_EXECUTION_SYSTEM_ID}",
"archive": false,
"notifications": [
{
"url":"${REQUESTBIN_URL}?event=\${EVENT}&jobid=\${JOB_ID}",
"event":"*",
"persistent":"true"
}
],
"parameters": {
"command":"echo hello"
}
}
""")
Because the setvar() command can evaluate $()-style bash shell substitutions, we will use it to submit our job. This captures the output of the submit command and lets us parse out the JOB_ID, which we'll use in several subsequent steps.
In [24]:
setvar("""
# Capture the output of the job submit command
OUTPUT=$(jobs-submit -F job.txt)
# Parse out the job id from the output
JOB_ID=$(echo $OUTPUT | cut -d' ' -f4)
""")
While the job is running, the requestbin you registered will receive a webhook from Agave every time a job event occurs. To monitor this in real time, evaluate the next cell and visit the printed URL in your browser:
In [25]:
print('%s?inspect' % os.environ['REQUESTBIN_URL'])
Of course, you can also monitor the job status by polling. Note that the email and webhook notifications are less wasteful of resources; we show polling here for completeness.
In [26]:
for _ in range(20):
    setvar("STAT=$(jobs-status $JOB_ID)")
    stat = os.environ["STAT"]
    if stat in ("FINISHED", "FAILED"):
        break
    sleep(5.0)
The jobs-history command gives you a record of what your job did, step by step. If your job fails for some reason, this is your best diagnostic.
In [27]:
!jobs-history ${JOB_ID}
This command shows you the job IDs and statuses of the last 5 jobs you ran.
In [28]:
!jobs-list -l 5
This next command provides you with a list of all the files generated by your job. You can use it to figure out which files you want to retrieve with jobs-output-get.
In [29]:
!jobs-output-list --rich --filter=type,length,name ${JOB_ID}
Retrieve the standard output.
In [30]:
!jobs-output-get ${JOB_ID} fork-command-1.out
!cat fork-command-1.out
Retrieve the standard error output.
In [31]:
!jobs-output-get ${JOB_ID} fork-command-1.err
!cat fork-command-1.err
In [32]:
%%writefile runagavecmd.py
import os
from setvar import *
from time import sleep

def runagavecmd(cmd, infile=None):
    setvar("REMOTE_COMMAND=" + cmd)
    setvar("REQUESTBIN_URL=$(requestbin-create)")
    print("")
    print(" ** QUERY STRING FOR REQUESTBIN **")
    print('%s?inspect' % os.environ['REQUESTBIN_URL'])
    print("")
    # The input file is an optional parameter, both
    # to our function and to the Agave application.
    if infile is None:
        setvar("INPUTS={}")
    else:
        setvar('INPUTS={"datafile":"' + infile + '"}')
    setvar("JOB_FILE=job-remote-$PID.txt")
    # Create the JSON for the job file.
    writefile("$JOB_FILE", """
{
    "name":"fork-command-1",
    "appId": "${AGAVE_USERNAME}-${MACHINE_NAME}-fork-1.0",
    "executionSystem": "${AGAVE_EXECUTION_SYSTEM_ID}",
    "archive": false,
    "notifications": [
        {
            "url":"${REQUESTBIN_URL}?event=\${EVENT}&jobid=\${JOB_ID}",
            "event":"*",
            "persistent":"true"
        }
    ],
    "parameters": {
        "command":"${REMOTE_COMMAND}"
    },
    "inputs":${INPUTS}
}""")
    # Submit the job and capture the output.
    setvar("""
# Capture the output of the job submit command
OUTPUT=$(jobs-submit -F $JOB_FILE)
# Parse out the job id from the output
JOB_ID=$(echo $OUTPUT | cut -d' ' -f4)
""")
    # Poll and wait for the job to finish.
    for _ in range(80):  # excessively generous
        setvar("STAT=$(jobs-status $JOB_ID)")
        stat = os.environ["STAT"]
        if stat in ("FINISHED", "FAILED"):
            break
        sleep(5.0)
    # Fetch the job output from the remote machine.
    setvar("CMD=jobs-output-get ${JOB_ID} fork-command-1.out")
    os.system(os.environ["CMD"])
    print("All done! Output follows.")
    # Load the output into memory.
    output = readfile("fork-command-1.out")
    print("=" * 70)
    print(output)
In [33]:
import runagavecmd as r
import importlib
importlib.reload(r)
In [34]:
r.runagavecmd("lscpu")
List the users and the permissions they have on the given job.
In [35]:
!jobs-pems-list ${JOB_ID}
In [36]:
# Now pair off with your neighbor and share your job with each other.
# For now, just give read access.
!jobs-pems-update -u training002 -p READ ${JOB_ID}
In [37]:
# Now let's see if we can see our neighbor's job
shared_job = !jobs-search --filter=id -l 1 owner.neq=${AGAVE_USERNAME}
os.environ['SHARED_JOB_ID'] = shared_job[0]
print(os.environ['SHARED_JOB_ID'])
Permissions are just that, permitting someone to do something. You said your neighbor could view your job. Let's see what that means.
In [38]:
# You already searched for the job and found it, so you should be able to list it
# and view the details.
!jobs-list $SHARED_JOB_ID
In [39]:
# You should also be able to view the history. Here we'll just return the last few
# events. Notice that the permission grant shows up as a history event.
!jobs-history --limit 3 --order desc $SHARED_JOB_ID
In [40]:
# You can also view their job output
!jobs-output-list -L $SHARED_JOB_ID
In [41]:
# What if we no longer want to see the job? Let's delete it.
!jobs-delete $SHARED_JOB_ID
D'oh! We can't delete the shared job because we weren't granted write permission.
In [42]:
# Let's grant write access and see what we can do
!jobs-pems-update -u training002 -p READ_WRITE ${JOB_ID}
In [43]:
# Now let's see if we can delete the shared job
!jobs-delete $SHARED_JOB_ID
In [44]:
# Wait, now we don't have anything to work with.
# No worries: Agave doesn't really delete anything. Your job is still there;
# we just need to restore it.
!jobs-restore $SHARED_JOB_ID
In [45]:
# Now let's try to rerun the job
!jobs-resubmit $SHARED_JOB_ID
In [46]:
# Well, what app did they use in the job?
shared_job = !jobs-list -v --filter=executionSystem,appId $SHARED_JOB_ID | jq -r '. | [ .executionSystem , .appId] | .[]'
print(shared_job)
os.environ['SHARED_JOB_APP'] = shared_job[1]
os.environ['SHARED_JOB_SYSTEM'] = shared_job[0]
In [47]:
# Hmm, do we have access to the app?
! apps-pems-list $SHARED_JOB_APP
In [48]:
# Oh, we don't have permission to even view the app. Guess our job permissions
# don't extend to the application. Let's be good neighbors and share our apps
# with each other.
! apps-pems-update -u training002 -p READ "${AGAVE_USERNAME}-${MACHINE_NAME}-fork-1.0"
In [49]:
# Now do we have access to the app?
! apps-pems-list $SHARED_JOB_APP
In [50]:
# Score! But wait, do I need EXECUTE permission to run the app? We should grant that too.
! apps-pems-update -u training002 -p EXECUTE "${AGAVE_USERNAME}-${MACHINE_NAME}-fork-1.0"
In [51]:
# Now do we have access to the app?
! apps-pems-list $SHARED_JOB_APP
In [52]:
# I guess permissions aren't hierarchical. Now I can execute the app (I think),
# but I can't read it. How about we grant READ_EXECUTE instead?
! apps-pems-update -u training002 -p READ_EXECUTE "${AGAVE_USERNAME}-${MACHINE_NAME}-fork-1.0"
In [53]:
# Now do we have access to the app?
! apps-pems-list $SHARED_JOB_APP
In [54]:
# So now we can rerun our neighbor's job, right?
!jobs-resubmit -v $SHARED_JOB_ID
In [55]:
# Drat. Why can't we run it now? Do we have system access?
!systems-roles-list $SHARED_JOB_SYSTEM
In [56]:
# OK, let's skip to the end: we need to grant a USER role on the system,
# rather than a GUEST role.
!systems-roles-addupdate -u training002 -r USER $AGAVE_EXECUTION_SYSTEM_ID
In [57]:
# That should work, right?
!systems-roles-list $SHARED_JOB_SYSTEM
In [58]:
# So can we run the job now?
resubmitted_job_id = ! jobs-resubmit -v --filter=id $SHARED_JOB_ID | jq -r '.id'
os.environ['RESUBMITTED_JOB_ID'] = resubmitted_job_id[0]
In [59]:
# Yay! Wait, who owns the data?
print(resubmitted_job_id[0])
! jobs-pems-list $RESUBMITTED_JOB_ID
In [60]:
# mine, mine, mine, mine, mine, mine, mine, mine, mine, mine, mine, mine, mine
# kill it, we're moving on.
! jobs-stop $RESUBMITTED_JOB_ID
In [61]:
# We can also share data in a few ways.
job_output_url = ! jobs-output-list -v --filter=_links $JOB_ID fork-command-1.out | jq -r '.[0]._links.self.href'
os.environ['JOB_OUTPUT_URL'] = job_output_url[0]
In [62]:
postit_url = ! postits-create -m 3 -l 86400 -V $JOB_OUTPUT_URL | jq -r '.result._links.self.href'
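A postit is a pre-authenticated, disposable URL to a file: -m 3 caps it at three uses and -l 86400 expires it after a day. Anyone holding the link can download the file without logging in; besides clicking it in a browser, you could fetch it from the shell:

# Fetch the shared file through the postit (this counts as one of its three uses).
os.environ['POSTIT_URL'] = postit_url[0]
!curl -s "$POSTIT_URL"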
In [63]:
# Click on the link a few times to see it working.
print(postit_url[0])
In [64]:
# You can also share your data via the files API.
# Let's share our job output file with each other.
job_path = ! jobs-list -v $JOB_ID | jq -r '.outputPath'
os.environ['JOB_OUTPUT_FOLDER_PATH'] = job_path[0]
! files-pems-update -u training002 -p READ -S $AGAVE_EXECUTION_SYSTEM_ID $JOB_OUTPUT_FOLDER_PATH/fork-command-1.out
In [65]:
!jobs-delete $JOB_ID
Finally, you can also launch your app from the Agave ToGo web interface; the next cell prints a direct link to its run page.
In [69]:
!echo http://togo.agaveplatform.org/app/#/apps/${AGAVE_USERNAME}-${MACHINE_NAME}-fork-1.0/run